Abstract: Data mining and machine learning techniques
are now used in many areas of daily life, including
meteorological data processing and daily weather
prediction. In this paper we use the K-Nearest Neighbor
algorithm to predict cyclone storms. K-Nearest Neighbor
is a classifier that assigns the data to the different
stages of a cyclone storm. We use three parameters,
Estimated Central Pressure, Maximum Sustained Surface
Wind and Pressure Drop, to decide the class of the
storm, and five years of storm data, from 2001 to 2005,
for the prediction. The classifier assigns the data to
five classes (Depression, Deep Depression, Cyclone
Storm, Severe Cyclone Storm, and Very Severe Cyclone
Storm) with 88% accuracy; the remaining 12% of the data
are misclassified.

General Terms: Classifier, Cyclone Storm, RSRW 


Keywords: Data Mining, K-Nearest Neighbor, Cyclone 

1. INTRODUCTION


Data mining is a broad field that is applied in
domains such as medicine, games, business and
entertainment. In this paper we apply a data mining
technique to weather prediction. Weather prediction
covers daily weather parameters such as rainfall,
temperature and wind speed, as well as severe weather
events such as storms and floods. Such predictions can
help reduce the economic and human losses caused by
severe weather such as floods and storms.


Tropical Cyclones (TCs) lead to potentially severe
coastal flooding through wind surge and also through
rainfall-runoff processes, and there is growing
interest in modeling these processes simultaneously [1].
Hurricanes of increasing intensity, extreme droughts,
floods, and rising sea levels might occur due to rising
temperature during the warming phase of climate change.
Considering that our planet will reach 9 billion
inhabitants by mid-century, the associated social,
economic, and environmental impacts are enormous.
Chaunté W [2] tried to predict Tropical Cyclones (TCs)
using changes in Cloud Clusters (CCs). Prior studies
have attempted to predict tropical cyclogenesis (TCG)
using numerical weather prediction models and satellite
and radar data, but the work in [2] is not itself a
prediction study: its goal is to objectively obtain the
actual locations of CCs, extract features that provide
more information about each CC, and distinguish between
developing and non-developing CCs based on the
extracted features.


In the present work, a K-nearest neighbor (K-NN) model
is used for prediction. K-NN is an important pattern
recognition technique in soft computing. Sharma et al.
[8] forecasted storms using a soft computing method.
Chakrabarty et al. [4] nowcasted severe storms using
K-NN models with different values of K. K-NN is a
widely used data mining algorithm for classification
and appears in many different applications. The K-NN
algorithm was originally suggested by Cover in 1968. It
operates by comparing a given test data point with the
training data points, finding the training data points
(neighbors) that are most similar to it, and then
predicting the class label of the test point from the
labels of these neighbors [10]. K-NN is a
non-parametric method for classifying objects based on
the closest training examples in the feature space. It
is a type of instance-based learning, or lazy learning,
in which the function is only approximated locally and
all computation is deferred until classification. The
K-NN technique was applied by Li et al. [11] to
forecast solar flares. Brath et al. [12] and
Jayawardena et al. [13] applied the K-NN method to
flood forecasting. Jan et al. [14] used a data mining
technique for seasonal to inter-annual climate
prediction. Bankert and Tag [2002] used a set of
characteristic features to define a TC in an SSM/I
(Special Sensor Microwave/Imager) image and then used
the K-NN algorithm to match these features with
historical images of tropical cyclones to estimate the
intensity [15].


In this paper we propose a method to predict a cyclone
storm and its class using a data mining technique. We
use five years of cyclone storm data, from 2001 to
2005, for the Bay of Bengal, the Indian Ocean and the
Arabian Sea. All the data were collected from the India
Meteorological Department. For the prediction we use
the K-Nearest Neighbor algorithm, which gives results
with 88% accuracy.


2. DATA 
2.1 DATA COLLECTION 


All the data used in this paper were collected from the
India Meteorological Department, Govt. of India. We use
five years of cyclone storm data, from 2001 to 2005,
covering three regions: the Bay of Bengal, the Arabian
Sea and the Indian Ocean. For each storm, the records
start at the initial depression, follow the successive
states of cyclone storm intensity, and end when the
storm has weakened back to a depression. In total there
are 703 records for the prediction.


2.2 DATA DESCRIPTION 


The collected data set has eleven parameters in total,
including Name of the Storm, Region, Latitude,
Longitude, Time, CI No, Estimated Central Pressure,
Maximum Sustained Surface Wind, Pressure Drop and Grade
of the Cyclone Storm. With this data set we can confirm
the place, time and intensity of the storm. The Grade
is the target of the prediction process: we predict the
Grade of the Cyclone Storm from the other parameters of
the storm data. Only those parameters that affect the
result are considered, so we take three of the eleven
parameters for the prediction process: Estimated
Central Pressure, Maximum Sustained Surface Wind and
Pressure Drop.
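
The selection of the three predictors and the target can be sketched as follows. This is a minimal illustration in Python, not the R code used for the results; the `StormRecord` class, its field names and the sample values are hypothetical and only stand in for the actual records.

```python
from dataclasses import dataclass


@dataclass
class StormRecord:
    """One three-hourly observation; field names are hypothetical."""
    name: str
    region: str
    latitude: float
    longitude: float
    time: str
    ecp_hpa: float            # Estimated Central Pressure (hPa)
    msw_kt: float             # Maximum Sustained Surface Wind (kt)
    pressure_drop_hpa: float  # Pressure Drop (hPa)
    grade: str                # target class: D, DD, CS, SCS or VSCS


def to_features(rec: StormRecord) -> tuple:
    """Keep only the three predictors used for classification."""
    return (rec.ecp_hpa, rec.msw_kt, rec.pressure_drop_hpa)


# Made-up values, for illustration only.
sample = StormRecord("EXAMPLE", "Bay of Bengal", 15.0, 88.0, "0300",
                     998.0, 30.0, 5.0, "DD")
features, target = to_features(sample), sample.grade
```

Position, time and storm name stay in the record for reference, but only `features` would be passed to the classifier, with `target` as the label to predict.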


These three parameters are recorded from the beginning
of the storm depression and then calculated every three
hours until the cyclone storm loses its intensity and
weakens into a small depression. Estimated Central
Pressure is measured in hPa (hectopascal). Maximum
Sustained Surface Wind is the wind speed measured at
the surface level, in kt (knots). Pressure Drop is the
drop in pressure relative to the surface level, also
measured in hPa. The full data set, covering 2001 to
2005, contains 703 records, with measurements from
three regions: the Indian Ocean, the Bay of Bengal and
the Arabian Sea. The target class, the Grade of the
Cyclone Storm, has five categories. A cyclone storm
starts as a Depression (D) and then turns into a Deep
Depression (DD); next comes the actual Cyclone Storm
(CS), then the Severe Cyclone Storm (SCS), and the last
grade, the cyclone storm at its utmost level, is the
Very Severe Cyclone Storm (VSCS).
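
Because the five grades form an ordered scale, they can be represented as an ordered enumeration. The sketch below is only an illustration in Python; the numeric codes 1 to 5 and the helper `is_intensifying` are assumptions introduced here, not part of the data set.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Storm grades in increasing order of intensity (codes are assumed)."""
    D = 1      # Depression
    DD = 2     # Deep Depression
    CS = 3     # Cyclone Storm
    SCS = 4    # Severe Cyclone Storm
    VSCS = 5   # Very Severe Cyclone Storm


def is_intensifying(previous: Grade, current: Grade) -> bool:
    """True when the storm has moved to a higher grade."""
    return current > previous
```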


3. METHOD 


Here the K-Nearest Neighbor algorithm is used to predict
the class of a cyclone storm, reporting which class a
particular storm belongs to, using the following
methodology.


3.1. K-Nearest Neighbor (K-NN) 


Yakowitz extended the K-nearest neighbor method,
constructing a robust theoretical base for it, and
introduced it into successful forecasting in
hydrological research. In this paper the K-nearest
neighbor method is applied to recognize the Grade of
the Cyclone Storm. The total data set is divided into a
training data set and a test data set, each containing
records of all five classes. The training data set has
500 records: 101 of the Depression class, 121 of Deep
Depression, 173 of Cyclone Storm, 62 of Severe Cyclone
Storm and 43 of Very Severe Cyclone Storm. The test
data set has 203 records: 56 of the Depression class,
76 of Deep Depression, 51 of Cyclone Storm, 5 of Severe
Cyclone Storm and 15 of Very Severe Cyclone Storm. The
training data set is arranged consecutively by D, DD,
CS, SCS and VSCS class data vectors. A similarity
measure is computed between each data vector of the
test set and each data vector of the training set. The
similarity between a training and a test observation
vector, say $p = (p_1, p_2, \ldots, p_n)$ and
$q = (q_1, q_2, \ldots, q_n)$, is defined as

\[
\mathrm{sim}(p, q) = \cos\theta
  = \frac{\sum_{i=1}^{n} p_i q_i}
         {\sqrt{\sum_{i=1}^{n} p_i^2}\,\sqrt{\sum_{i=1}^{n} q_i^2}}
\]


The similarity measure between two vectors is the
cosine of the angle between them: the smaller the
angle, the greater the similarity. It indicates how
close the two vectors (one test vector and one training
vector) are to each other. The cosine is determined for
each test data vector with each training data vector,
and the cosine values are arranged in decreasing order.
As there are 500 training data vectors, there are also
500 cosine values for each test vector. Half of these
values are considered for analysis at first. For each
test vector of grade Depression, if Depression data
make up the largest number of entries within that half
of the set of cosine values, the vector is considered
properly classified as Depression class. The same
applies to the other classes. Here the value of k is 26.
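
The cosine-based neighbor search described above can be sketched in a few lines. This is a minimal Python illustration, not the R code used for the results. It assumes a plain majority vote among the k most similar training vectors, which simplifies the half-set procedure described in the text; the function names and the toy vectors are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def knn_predict(train_vectors, train_labels, test_vector, k):
    """Majority class among the k training vectors most similar to test_vector."""
    scored = [(cosine_similarity(v, test_vector), label)
              for v, label in zip(train_vectors, train_labels)]
    # Decreasing similarity: the smallest angles come first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy vectors (ECP in hPa, MSW in kt, pressure drop in hPa); values are made up.
train_x = [(1000.0, 25.0, 3.0), (1002.0, 25.0, 3.0),
           (990.0, 45.0, 12.0), (988.0, 50.0, 14.0)]
train_y = ["D", "D", "CS", "CS"]
prediction = knn_predict(train_x, train_y, (989.0, 47.0, 13.0), k=3)
```

With the 500 training vectors of this paper, the same call would be made with `k=26` for each of the 203 test vectors.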


4. RESULT 


Chart 1:


Table 1: Results from the prediction using KNN

                       |          Actual
Class of the test data |  CS |   D |  DD | SCS | VSCS
CS                     |  51 |   0 |   0 |   0 |    0
D                      |   0 |  55 |  25 |   0 |    0
DD                     |   0 |   1 |  51 |   0 |    0
SCS                    |   0 |   0 |   0 |   5 |    0
VSCS                   |   0 |   0 |   0 |   0 |   15


The above results were obtained using the KNN algorithm
on the R platform. R is a programming language and
software environment for statistical computing and
graphics. R was created by Ross Ihaka and Robert
Gentleman at the University of Auckland, New Zealand,
and is currently developed by the R Development Core
Team, of which Chambers is a member. R is named partly
after the first names of the first two R authors and
partly as a play on the name of S. R is a GNU project.
The source code for the R software environment is
written primarily in C, Fortran, and R. R is freely
available under the GNU General Public License, and
pre-compiled binary versions are provided for various
operating systems. R uses a command line interface;
there are also several graphical front-ends for it.


From the results of the KNN algorithm, more than 88% of
the test data were classified correctly. The remaining
12% of misclassifications occur only in the Depression
and Deep Depression classes; the other classes are
classified perfectly, without any misclassified
records. In this paper we take k = 26, which is
approximately the square root of the total number of
data records available (703). This gave the most
accurate results compared with the other values of k.
Chakrabarty et al., in 2013, applied the K-NN technique
to the prediction of severe thunderstorms with around
12 hours of lead time. They used two types of weather
parameters, moisture difference and dry adiabatic lapse
rate, considered from the surface level up to five
different geopotential layers of the upper air, giving
10 weather parameters. Only 55.55% of the "squall
storm" days were properly classified. When they applied
a modified KNN technique (with k = 3), they obtained
more than 87% accurate classification of the "squall
storm" days and more than 71% accurate classification
of the "no storm" days. In this paper, by contrast, the
KNN method (with k = 26) is applied to three weather
variables: Estimated Central Pressure, Maximum
Sustained Surface Wind and Pressure Drop.
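
The choice of k can be checked with a short calculation. The Python sketch below uses the class counts given in the Method section; taking the integer part of the square root (rounding down) is an assumption made here, since the square root of 703 is about 26.5.

```python
import math

# Class counts as stated in the Method section.
train_counts = {"D": 101, "DD": 121, "CS": 173, "SCS": 62, "VSCS": 43}
test_counts = {"D": 56, "DD": 76, "CS": 51, "SCS": 5, "VSCS": 15}

n_train = sum(train_counts.values())   # 500 training records
n_test = sum(test_counts.values())     # 203 test records
n_total = n_train + n_test             # 703 records overall

# Integer part of the square root of the total record count
# (assumed rounding rule): 26**2 = 676 <= 703 < 729 = 27**2.
k = math.isqrt(n_total)
```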


5. DISCUSSION AND CONCLUSION 


The challenge undertaken in this forecasting work is
the proper selection of a machine learning technique to
obtain accurate predictions using only three input
weather variables: Estimated Central Pressure, Maximum
Sustained Surface Wind and Pressure Drop. The model
classifies nearly 88% of the data correctly; the error
rate of 12% occurs only in the Deep Depression and
Depression classes, which are the starting states of a
cyclone storm. The cyclone storm intensity can
therefore be predicted clearly, and the prediction can
be used to warn people in advance so that destructive
events can be avoided.